Smart Paper Notes

Biomedical image analysis competitions: The state of current participation practice

Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J. Adler, Patrick Godau, Veronika Cheplygina, Michal Kozubek, Sharib Ali

Categories: Computer Vision | Machine Learning

2022-12-16

The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of the challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
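As an illustration of the patch-based strategy mentioned in the abstract above, the following is a minimal sketch of how an image too large to process at once might be cut into fixed-size patches. The function name `extract_patches`, the nested-list image representation, and the `patch` and `stride` parameters are assumptions made for this example; they are not taken from the survey or from any participant's code.

```python
def extract_patches(image, patch, stride):
    """Cut a 2D image (list of rows) into square patches.

    Returns a list of (row, col, patch_pixels) tuples, where (row, col)
    is the top-left corner of the patch in the original image.
    Border regions that do not fit a full patch are skipped in this sketch.
    """
    height, width = len(image), len(image[0])
    patches = []
    for r in range(0, height - patch + 1, stride):
        for c in range(0, width - patch + 1, stride):
            pixels = [row[c:c + patch] for row in image[r:r + patch]]
            patches.append((r, c, pixels))
    return patches


# Toy 4x4 "image" whose pixel value encodes its position.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = extract_patches(image, patch=2, stride=2)
corners = [(r, c) for r, c, _ in patches]
```

With a stride equal to the patch size the patches tile the image without overlap; a smaller stride would produce overlapping patches, which is a common choice when predictions are later stitched back together.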


MammoDL: Mammographic Breast Density Estimation using Federated Learning

Keshava Katti, Ramya Muthukrishnan, Angelina Heyler, Sarthak Pati, Aprupa Alahari, Michael Sanborn, Emily F. Conant, Christopher Scott, Stacey Winham, Celine Vachon

Categories: Computer Vision | Machine Learning

2022-06-11

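Assessing breast cancer risk from imaging remains a subjective process in which radiologists employ computer-aided detection (CAD) systems or qualitative visual assessment to estimate breast percent density (PD). More advanced machine learning (ML) models have become the most promising way to quantify breast cancer risk for early, accurate, and equitable diagnosis, but such models in medical research are often limited to small, single-institution data. Since patient demographics and imaging characteristics may vary considerably across imaging sites, models trained on single-institution data tend not to generalize well. In response to this problem, MammoDL is proposed, an open-source software tool that leverages the UNet architecture to accurately estimate breast PD and complexity from digital mammography (DM). With the Open Federated Learning (OpenFL) library, this solution enables secure training on datasets across multiple institutions. MammoDL is a leaner, more flexible model than its predecessors, with improved generalization due to federation-enabled training on larger, more representative datasets.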


Generative appearance replay for continual unsupervised domain adaptation

Boqi Chen, Kevin Thandiackal, Pushpak Pati, Orcun Goksel

Categories: Computer Vision | Artificial Intelligence

2023-01-03

Deep learning models can achieve high accuracy when trained on large amounts of labeled data. However, real-world scenarios often involve several challenges: Training data may become available in installments, may originate from multiple different domains, and may not contain labels for training. Certain settings, for instance medical applications, often involve further restrictions that prohibit retention of previously seen data due to privacy regulations. In this work, to address such challenges, we study unsupervised segmentation in continual learning scenarios that involve domain shift. To that end, we introduce GarDA (Generative Appearance Replay for continual Domain Adaptation), a generative-replay based approach that can adapt a segmentation model sequentially to new domains with unlabeled data. In contrast to single-step unsupervised domain adaptation (UDA), continual adaptation to a sequence of domains enables leveraging and consolidation of information from multiple domains. Unlike previous approaches in incremental UDA, our method does not require access to previously seen data, making it applicable in many practical scenarios. We evaluate GarDA on two datasets with different organs and modalities, where it substantially outperforms existing techniques.


MixupE: Understanding and Improving Mixup from Directional Derivative Perspective

Vikas Verma, Sarthak Mittal, Wai Hoh Tang, Hieu Pham, Juho Kannala, Yoshua Bengio, Arno Solin, Kenji Kawaguchi

Categories: Machine Learning | Computer Vision

2022-12-27

Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels. This technique is known to improve the generalization performance in many learning paradigms and applications. In this work, we first analyze Mixup and show that it implicitly regularizes infinitely many directional derivatives of all orders. We then propose a new method to improve Mixup based on the novel insight. To demonstrate the effectiveness of the proposed method, we conduct experiments across various domains such as images, tabular data, speech, and graphs. Our results show that the proposed method improves Mixup across various datasets using a variety of architectures, for instance, exhibiting an improvement over Mixup by 0.8% in ImageNet top-1 accuracy.
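The interpolation step described in the first sentence of the abstract above can be sketched as follows. This shows only standard Mixup, not the improved method the paper proposes; the helper names `sample_lambda` and `mixup_pair`, the list-based inputs, and the default `alpha` are assumptions for illustration. Drawing the mixing weight from a Beta(alpha, alpha) distribution follows the usual Mixup formulation.

```python
import random


def sample_lambda(alpha=1.0, rng=random):
    """Draw a mixing weight in [0, 1] from Beta(alpha, alpha)."""
    return rng.betavariate(alpha, alpha)


def mixup_pair(x1, y1, x2, y2, lam):
    """Linearly interpolate two inputs and their (one-hot) labels.

    lam weights the first sample; (1 - lam) weights the second.
    """
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


# Fixed weight so the result is easy to check by hand.
x, y = mixup_pair([1.0, 0.0], [1.0, 0.0], [0.0, 2.0], [0.0, 1.0], lam=0.25)

# During training the weight would typically be sampled per batch or per pair.
lam = sample_lambda(alpha=0.4)
```

Here the mixed label `y` is a soft label: the model is trained to predict class proportions that match the proportions used to blend the inputs.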


Is Bio-Inspired Learning Better than Backprop? Benchmarking Bio Learning vs. Backprop

Manas Gupta, Sarthak Ketanbhai Modi, Hang Zhang, Joon Hei Lee, Joo Hwee Lim

Categories: Machine Learning

2022-12-09

Bio-inspired learning has been gaining popularity recently given that Backpropagation (BP) is not considered biologically plausible. Many algorithms have been proposed in the literature which are all more biologically plausible than BP. However, apart from overcoming the biological implausibility of BP, a strong motivation for using Bio-inspired algorithms remains lacking. In this study, we undertake a holistic comparison of BP vs. multiple Bio-inspired algorithms to answer the question of whether Bio-learning offers additional benefits over BP, rather than just biological plausibility. We test Bio-algorithms under different design choices such as access to only partial training data, resource constraints in terms of the number of training epochs, sparsification of the neural network parameters, and addition of noise to input samples. Through these experiments, we notably find two key advantages of Bio-algorithms over BP. Firstly, Bio-algorithms perform much better than BP when the entire training dataset is not supplied. Four of the five Bio-algorithms tested outperform BP by up to 5% accuracy when only 20% of the training dataset is available. Secondly, even when the full dataset is available, Bio-algorithms learn much more quickly and converge to a stable accuracy in far fewer training epochs than BP. Hebbian learning, specifically, is able to learn in just 5 epochs compared to around 100 epochs required by BP. These insights present practical reasons for utilising Bio-learning beyond its biological plausibility and also point towards interesting new directions for future work on Bio-learning.


Estimation of Appearance and Occupancy Information in Birds Eye View from Surround Monocular Images

Sarthak Sharma, Unnikrishnan R. Nair, Udit Singh Parihar, Midhun Menon S, Srikanth Vidapanakal

Categories: Computer Vision | Robotics

2022-11-08

Autonomous driving requires efficient reasoning about the location and appearance of the different agents in the scene, which aids in downstream tasks such as object detection, object tracking, and path planning. The past few years have witnessed a surge in approaches that combine the different task-based modules of the classic self-driving stack into an End-to-End (E2E) trainable learning system. These approaches replace perception, prediction, and sensor fusion modules with a single contiguous module with a shared latent space embedding, from which one extracts a human-interpretable representation of the scene. One of the most popular representations is the Birds-eye View (BEV), which expresses the location of different traffic participants in the ego vehicle frame from a top-down view. However, a BEV does not capture the chromatic appearance information of the participants. To overcome this limitation, we propose a novel representation that captures the appearance and occupancy information of various traffic participants from an array of monocular cameras covering a 360-degree field of view (FOV). We use a learned image embedding of all camera images to generate a BEV of the scene at any instant that captures both appearance and occupancy of the scene, which can aid in downstream tasks such as object tracking and executing language-based commands. We test the efficacy of our approach on a synthetic dataset generated from CARLA. The code, data set, and results can be found at https://rebrand.ly/APP OCC-results.


Comparative analysis of segmentation and generative models for fingerprint retrieval task

Megh Patel, Devarsh Patel, Sarthak Patel

Categories: Computer Vision | Machine Learning

2022-09-13

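Biometric authentication such as fingerprints has become an integral part of modern technology for the authentication and verification of users. It is pervasive in more ways than most of us are aware of. However, the quality of fingerprint images deteriorates if the fingers are dirty, wet, or injured, or when sensors malfunction. Therefore, recovering the original fingerprint by removing the noise and inpainting it to restructure the image is crucial for authentication. Hence, this paper proposes a deep learning approach to address these issues using generative (GAN) and segmentation models. A qualitative and quantitative comparison is made between pix2pixGAN and CycleGAN (generative models) and U-Net (segmentation model). To train the models, we created our own dataset, NFD, a meticulously built noisy fingerprint dataset with different backgrounds, along with scratches in some images, to make it more realistic and robust. In our research, the U-Net model performed better than the GAN networks.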


Application of image-to-image translation in improving pedestrian detection

Devarsh Patel, Sarthak Patel, Megh Patel

Categories: Computer Vision | Artificial Intelligence | Machine Learning

2022-09-08

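The lack of effective target regions makes it difficult to perform several visual functions in low-intensity light, including pedestrian recognition and image-to-image translation. In this situation, by accumulating high-quality information through the combined use of infrared and visible images, it is possible to detect pedestrians even in low light. In this study, we use advanced deep learning models such as pix2pixGAN and YOLOv7 on the LLVIP dataset, which contains visible-infrared image pairs for low-light vision. This dataset contains 33672 images, most of which were captured in dark scenes, and the pairs are tightly synchronized in time and location.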


EvolvingBehavior: Towards Co-Creative Evolution of Behavior Trees for Game NPCs

Nathan Partlan, Luis Soto, Jim Howe, Sarthak Shrivastava, Magy Seif El-Nasr, Stacy Marsella

Categories: Neural and Evolutionary Computing | Artificial Intelligence

2022-09-01

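To assist game developers in crafting game NPCs, we present EvolvingBehavior, a novel tool for genetic programming to evolve behavior trees in Unreal Engine 4. In an initial evaluation, we compare evolved behavior to hand-crafted trees designed by our researchers and to randomly grown trees in a 3D survival game. We find that EvolvingBehavior is capable of producing behavior that approaches the designer's goals in this context. Finally, we discuss implications and future avenues of exploration for co-creative game AI design tools, as well as challenges and difficulties in behavior tree evolution.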


Interpretable Multimodal Emotion Recognition using Hybrid Fusion of Speech and Image Data

Puneet Kumar, Sarthak Malik, Balasubramanian Raman

Categories: Computer Vision

2022-08-25

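This paper proposes a multimodal emotion recognition system based on hybrid fusion that classifies the emotions depicted by speech utterances and corresponding images into discrete classes. A new interpretability technique has been developed to identify the important speech and image features leading to the prediction of particular emotion classes. The proposed system's architecture has been determined through intensive ablation studies. It fuses the speech and image features and then combines the speech, image, and intermediate fusion outputs. The proposed interpretability technique incorporates a divide-and-conquer approach to compute Shapley values denoting the importance of each speech and image feature. We have also constructed a large-scale dataset (the IIT-R SIER dataset) consisting of speech utterances, corresponding images, and class labels, i.e., "anger", "happy", "hate", and "sad". The proposed system has achieved 83.29% accuracy for emotion recognition. The enhanced performance of the proposed system underlines the importance of utilizing complementary information from multiple modalities for emotion recognition.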
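The abstract above refers to Shapley values as a measure of feature importance. As a rough illustration of what such a value is, the sketch below computes exact Shapley values for a tiny two-feature toy game by averaging marginal contributions over all feature orderings. This is the textbook definition, not the paper's divide-and-conquer technique; the function name `shapley_values`, the toy `score` function, and its numbers are hypothetical.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values via enumeration of all orderings.

    `value` maps a frozenset of players to a number. Cost grows
    factorially with the number of players, so this suits toy cases only.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            extended = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += value(extended) - value(coalition)
            coalition = extended
    return {p: total / len(orderings) for p, total in totals.items()}


def score(features):
    """Hypothetical model score for a subset of modalities."""
    table = {
        frozenset(): 0.0,
        frozenset({"speech"}): 0.5,
        frozenset({"image"}): 0.25,
        frozenset({"speech", "image"}): 1.0,
    }
    return table[frozenset(features)]


phi = shapley_values(["speech", "image"], score)
```

In this toy case the two values sum to the score of the full feature set (1.0), which is the efficiency property that makes Shapley values attractive for attributing a prediction to its inputs.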

